Sprint 3 Week 9 Plan: Medium File Refactoring (Services Layer)
Date: 2025-11-05
Last Updated: 2025-11-09
Sprint: Sprint 3 - Medium File Refactoring
Week: Week 9 (Batch 3A: Services)
Duration: 1 week
Priority: 🟡 P2 - Medium Technical Debt
Executive Summary
Sprint 3 Week 9 focuses on refactoring 8 service files (300-500 lines) by extracting long functions (>50 lines) and adding error handling. Unlike Sprint 2's file splits, this sprint uses function extraction to break down complex methods while keeping files intact.
Goal: Eliminate all functions >50 lines in Batch 3A service files
Approach:
- Extract helper methods from long functions
- Add try/except blocks for risky operations
- Improve logging for debugging
- Maintain 100% backward compatibility
Files: 8 services files (3,082 lines total)
Sprint 3 Overview
Sprint 3 Structure
Sprint 3 Week 9 (Batch 3A: Services)
- 8 service files
- Focus: Extract long functions, add error handling
- Pattern: Function extraction

Sprint 3 Week 10 (Batch 3B: Data & Database)
- 7 data/database/client files
- Same approach: Function extraction + error handling
Total Sprint 3: 15 files (~5,900 lines)
Batch 3A: Services Files (Week 9)
Files Summary
| # | File | Lines | Long Functions | Longest | Approach |
|---|---|---|---|---|---|
| 3.1 | family_league_inference.py | 434 | 3 | 78L | Extract inference helpers |
| 3.2 | logo_generator.py | 322 | 1 | 99L | Extract image processing steps |
| 3.3 | match_debug_logger.py | 459 | 1 | 181L! | Extract Excel sheet writers |
| 3.4 | match_suggestions.py | 382 | 1 | 56L | Extract similarity calculation helpers |
| 3.5 | provider_config_manager.py | 474 | 3 | 119L! | Extract cache/DB helpers |
| 3.6 | provider_orchestrator.py | 394 | 1 | 89L | Extract provider processing steps |
| 3.7 | scoped_team_extractor.py | 313 | 1 | 94L | Extract regex matching helpers |
| 3.8 | enhanced_match_cache.py | 304 | 0 | 42L | Add error handling only |
| Total | 8 files | 3,082 | 11 | 181L | Function extraction |
Detailed File Analysis
Task 3.1: family_league_inference.py (434 lines)
Current State: 434 lines, 3 long functions
Long Functions Identified:
1. _infer_from_event_context(): 78 lines (317-394) - Event context inference
2. _infer_from_teams(): 74 lines (242-315) - Team-based inference
3. infer_leagues(): 63 lines (111-173) - Main inference coordinator
Refactoring Approach:
For _infer_from_event_context() (78 lines):
```python
# Current: 78-line monolith
def _infer_from_event_context(self, teams, payload):
    # ... 78 lines of event context matching ...

# After: Extract 4 helpers
def _infer_from_event_context(self, teams, payload):
    season_matches = self._extract_season_info(payload)
    tournament_matches = self._extract_tournament_info(payload)
    competition_matches = self._extract_competition_info(payload)
    return self._merge_event_candidates(
        season_matches, tournament_matches, competition_matches
    )

def _extract_season_info(self, payload): ...  # 20 lines
def _extract_tournament_info(self, payload): ...  # 20 lines
def _extract_competition_info(self, payload): ...  # 20 lines
def _merge_event_candidates(self, *candidates): ...  # 15 lines
```
For _infer_from_teams() (74 lines):
```python
# Current: 74-line monolith
def _infer_from_teams(self, team1, team2):
    # ... 74 lines of team matching ...

# After: Extract 2 helpers
def _infer_from_teams(self, team1, team2):
    team1_leagues = self._find_team_leagues(team1)
    team2_leagues = self._find_team_leagues(team2)
    return self._intersect_league_candidates(team1_leagues, team2_leagues)

def _find_team_leagues(self, team_name): ...  # 30 lines
def _intersect_league_candidates(self, leagues1, leagues2): ...  # 25 lines
```
For infer_leagues() (63 lines):
```python
# Current: 63-line coordinator
def infer_leagues(self, family, team1, team2, payload, provider):
    # ... 63 lines of orchestration ...

# After: Extract validation helper
def infer_leagues(self, family, team1, team2, payload, provider):
    if not self._validate_inference_inputs(family, team1, team2):
        return []
    # ... rest of logic (now <50 lines) ...

def _validate_inference_inputs(self, family, team1, team2): ...  # 15 lines
```
Estimated Time: 3 hours
- Analyze and extract helpers: 1.5 hours
- Test and verify: 1 hour
- Add error handling: 30 min
Task 3.2: logo_generator.py (322 lines)
Current State: 322 lines, 1 long function
Long Function Identified:
1. generate_split_logo(): 99 lines (179-277) - Creates split diagonal team logos
Refactoring Approach:
For generate_split_logo() (99 lines):
```python
# Current: 99-line monolith
def generate_split_logo(self, team1_info, team2_info, ...):
    # ... 99 lines of image processing ...

# After: Extract image processing steps
def generate_split_logo(self, team1_info, team2_info, ...):
    # Get cached or generate
    cached = self._get_cached_logo(cache_key)
    if cached:
        return cached
    # Generate new split logo
    canvas = self._create_canvas()
    logo1_img = self._load_team_logo(team1_info)
    logo2_img = self._load_team_logo(team2_info)
    mask = self._create_diagonal_mask(canvas.size)
    result = self._composite_split_logos(canvas, logo1_img, logo2_img, mask)
    return self._save_and_return(result, cache_key)

def _create_canvas(self): ...  # 10 lines
def _load_team_logo(self, team_info): ...  # 20 lines
def _composite_split_logos(self, canvas, logo1, logo2, mask): ...  # 25 lines
def _save_and_return(self, image, cache_key): ...  # 15 lines
```
Error Handling to Add:
- Wrap _download_image() with try/except (network failures)
- Handle PIL image processing errors
- Log failures with team info for debugging
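The download wrapper could follow a pattern like this sketch; `safe_download_image`, `download_fn`, and the log format are illustrative names, not the module's real API:

```python
import logging

logger = logging.getLogger(__name__)

def safe_download_image(download_fn, url, team_name):
    """Call a download function, logging and returning None on failure
    so one bad logo does not abort the whole generation run.

    Hypothetical wrapper; names are assumptions for illustration.
    """
    try:
        return download_fn(url)
    except Exception as e:
        logger.error("Logo download failed for %s (%s): %s", team_name, url, e)
        return None
```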
Estimated Time: 2 hours
- Extract image processing helpers: 1 hour
- Add error handling: 30 min
- Test with sample logos: 30 min
Task 3.3: match_debug_logger.py (459 lines)
Current State: 459 lines, 1 extremely long function
Long Function Identified:
1. _export_excel(): 181 lines (278-458) - Excel export with multiple sheets
Refactoring Approach:
For _export_excel() (181 lines) - Similar to Task 2.9 (analyze_mismatches.py):
```python
# Current: 181-line monolith
def _export_excel(self):
    # ... 181 lines creating 4-5 Excel sheets ...

# After: Extract sheet writers
def _export_excel(self):
    wb = Workbook()
    self._write_summary_sheet(wb)
    self._write_channels_sheet(wb)
    self._write_cache_attempts_sheet(wb)
    self._write_db_queries_sheet(wb)
    self._write_api_calls_sheet(wb)
    wb.save(self.excel_path)

def _write_summary_sheet(self, wb): ...  # 30 lines
def _write_channels_sheet(self, wb): ...  # 35 lines
def _write_cache_attempts_sheet(self, wb): ...  # 30 lines
def _write_db_queries_sheet(self, wb): ...  # 30 lines
def _write_api_calls_sheet(self, wb): ...  # 35 lines
```
Pattern: Same as analyze_mismatches.py Task 2.9 (excel_exporter.py)
Estimated Time: 1.5 hours
- Analyze Excel structure and extract 5 sheet writers: 1 hour
- Test Excel output: 30 min
Task 3.4: match_suggestions.py (382 lines)
Current State: 382 lines, 1 long function
Long Function Identified:
1. calculate_similarity(): 56 lines (250-305) - Multi-factor similarity calculation
Refactoring Approach:
For calculate_similarity() (56 lines):
```python
# Current: 56-line monolith
def calculate_similarity(self, unmatched, event):
    # ... 56 lines of similarity scoring ...

# After: Extract scoring components
def calculate_similarity(self, unmatched, event):
    team_score = self._calculate_team_similarity(unmatched, event)
    date_score = self._calculate_date_similarity(unmatched, event)
    time_score = self._calculate_time_similarity(unmatched, event)
    league_score = self._calculate_league_similarity(unmatched, event)
    total_score = (team_score * 0.5 + date_score * 0.3 +
                   time_score * 0.1 + league_score * 0.1)
    return min(total_score, 100.0)

def _calculate_team_similarity(self, unmatched, event): ...  # 15 lines
def _calculate_date_similarity(self, unmatched, event): ...  # 10 lines
def _calculate_time_similarity(self, unmatched, event): ...  # 10 lines
def _calculate_league_similarity(self, unmatched, event): ...  # 10 lines
```
Benefits:
- Each similarity component independently testable
- Easy to adjust weights (currently 50/30/10/10)
- Clear separation of concerns
Estimated Time: 1.5 hours
- Extract 4 similarity helpers: 1 hour
- Test similarity scoring: 30 min
Task 3.5: provider_config_manager.py (474 lines)
Current State: 474 lines, 3 long functions
Long Functions Identified:
1. _fetch_from_db(): 119 lines (256-374) - Fetch config from D1
2. _load_from_cache(): 96 lines (159-254) - Load config from YAML cache
3. _save_to_cache(): 77 lines (376-452) - Save config to YAML cache
Refactoring Approach:
For _fetch_from_db() (119 lines):
```python
# Current: 119-line monolith
def _fetch_from_db(self, provider_id):
    # ... 119 lines fetching provider/channels/overrides ...

# After: Extract fetch operations
def _fetch_from_db(self, provider_id):
    provider = self._fetch_provider_record(provider_id)
    channels = self._fetch_provider_channels(provider_id)
    overrides = self._fetch_provider_overrides(provider_id)
    return self._assemble_provider_config(provider, channels, overrides)

def _fetch_provider_record(self, provider_id): ...  # 25 lines
def _fetch_provider_channels(self, provider_id): ...  # 30 lines
def _fetch_provider_overrides(self, provider_id): ...  # 25 lines
def _assemble_provider_config(self, provider, channels, overrides): ...  # 30 lines
```
For _load_from_cache() (96 lines):
```python
# Current: 96-line monolith
def _load_from_cache(self, provider_id):
    # ... 96 lines loading YAML cache ...

# After: Extract load operations
def _load_from_cache(self, provider_id):
    cache_file = self._get_cache_file_path(provider_id)
    if not cache_file.exists():
        return None
    yaml_data = self._read_yaml_cache(cache_file)
    return self._parse_cached_config(yaml_data)

def _get_cache_file_path(self, provider_id): ...  # 10 lines
def _read_yaml_cache(self, cache_file): ...  # 15 lines
def _parse_cached_config(self, yaml_data): ...  # 40 lines
```
For _save_to_cache() (77 lines):
```python
# Current: 77-line monolith
def _save_to_cache(self, provider_id, config):
    # ... 77 lines saving YAML cache ...

# After: Extract save operations
def _save_to_cache(self, provider_id, config):
    cache_file = self._get_cache_file_path(provider_id)
    yaml_data = self._serialize_config_to_yaml(config)
    self._write_yaml_cache(cache_file, yaml_data)

def _serialize_config_to_yaml(self, config): ...  # 35 lines
def _write_yaml_cache(self, cache_file, yaml_data): ...  # 20 lines
```
Error Handling to Add:
- Wrap D1 queries with try/except (database errors)
- Handle YAML parsing errors
- Handle file I/O errors
- Log failures with provider ID
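The cache-read error handling could follow a pattern like this sketch. To keep the example dependency-free, `json.loads` stands in for `yaml.safe_load`; the real helper would pass the YAML loader and also catch `yaml.YAMLError`. All names here are illustrative:

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger(__name__)

def read_cache(cache_file, loader=json.loads):
    """Read and parse a cache file, returning None on any I/O or parse
    error instead of raising, so a corrupt cache falls back to a DB fetch.
    """
    try:
        return loader(Path(cache_file).read_text(encoding="utf-8"))
    except OSError as e:
        logger.error("Cache read failed for %s: %s", cache_file, e)
    except ValueError as e:  # json.JSONDecodeError subclasses ValueError
        logger.error("Cache parse failed for %s: %s", cache_file, e)
    return None
```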
Estimated Time: 3.5 hours
- Extract DB fetch helpers: 1.5 hours
- Extract cache load/save helpers: 1 hour
- Add error handling: 30 min
- Test with sample provider: 30 min
Task 3.6: provider_orchestrator.py (394 lines)
Current State: 394 lines, 1 long function
Long Function Identified:
1. process_all_providers(): 89 lines (133-221) - Process all providers with retry logic
Refactoring Approach:
For process_all_providers() (89 lines):
```python
# Current: 89-line monolith
def process_all_providers(self, date, ...):
    # ... 89 lines of provider processing ...

# After: Extract processing steps
def process_all_providers(self, date, ...):
    providers = self._get_active_providers_for_processing()
    with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
        futures = self._submit_provider_jobs(executor, providers, date, ...)
        results = self._collect_provider_results(futures)
    return self._summarize_processing_results(results)

def _get_active_providers_for_processing(self): ...  # 15 lines
def _submit_provider_jobs(self, executor, providers, date, ...): ...  # 25 lines
def _collect_provider_results(self, futures): ...  # 30 lines
def _summarize_processing_results(self, results): ...  # 20 lines
```
Error Handling to Add:
- Wrap ThreadPoolExecutor with try/except
- Handle provider timeout errors
- Log concurrent processing failures
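The timeout handling could use a per-future `result(timeout=...)` pattern like this sketch; the function name, 30-second default, and worker count are illustrative assumptions, not the orchestrator's real API:

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger(__name__)

def collect_with_timeout(jobs, per_job_timeout=30, max_workers=4):
    """Run named callables concurrently; a failed or timed-out job is
    logged and recorded as None rather than aborting the whole batch.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {name: executor.submit(fn) for name, fn in jobs.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=per_job_timeout)
            except Exception as e:  # includes concurrent.futures.TimeoutError
                logger.error("Provider %s failed: %s", name, e)
                results[name] = None
    return results
```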
Estimated Time: 2 hours
- Extract processing helpers: 1 hour
- Add error handling: 30 min
- Test with sample providers: 30 min
Task 3.7: scoped_team_extractor.py (313 lines)
Current State: 313 lines, 1 long function
Long Function Identified:
1. extract_team(): 94 lines (206-299) - Multi-scope team extraction
Refactoring Approach:
For extract_team() (94 lines):
```python
# Current: 94-line monolith
def extract_team(self, text, ...):
    # ... 94 lines of scoped regex matching ...

# After: Extract scope matching
def extract_team(self, text, league_hint=None, sport_hint=None, ...):
    # Try league-scoped first
    if league_hint:
        match = self._try_league_scoped_extraction(text, league_hint)
        if match:
            return match
    # Try sport-scoped
    if sport_hint:
        match = self._try_sport_scoped_extraction(text, sport_hint)
        if match:
            return match
    # Fallback to global
    return self._try_global_extraction(text)

def _try_league_scoped_extraction(self, text, league): ...  # 25 lines
def _try_sport_scoped_extraction(self, text, sport): ...  # 25 lines
def _try_global_extraction(self, text): ...  # 30 lines
```
Estimated Time: 2 hours
- Extract scope matching helpers: 1 hour
- Test with sample teams: 1 hour
Task 3.8: enhanced_match_cache.py (304 lines)
Current State: 304 lines, NO long functions (longest is 42 lines)
Approach: Add error handling only (no function extraction needed)
Error Handling to Add:
1. store_match() (46 lines): Wrap cache writes with try/except
2. find_match() (42 lines): Handle missing cache keys
3. cleanup_expired() (23 lines): Handle concurrent cleanup
4. All methods: Add logging for debugging
Example:
```python
# Before
def store_match(self, ...):
    self._by_tvg_id[tvg_id] = cached_match
    self._by_channel_name[channel_name] = cached_match

# After
def store_match(self, ...):
    try:
        self._by_tvg_id[tvg_id] = cached_match
        self._by_channel_name[channel_name] = cached_match
        logger.debug(f"Stored match for {channel_name}")
    except Exception as e:
        logger.error(f"Failed to store match: {e}")
        raise
```
Estimated Time: 1 hour
- Add try/except to 4 methods: 30 min
- Add logging statements: 15 min
- Test cache operations: 15 min
Implementation Strategy
Pattern: Function Extraction
Unlike Sprint 2's file splits, Sprint 3 uses function extraction:
When to Extract:
- Function >50 lines
- Clear logical sections (e.g., step 1, step 2, step 3)
- Repeated code blocks
- Complex nested logic

What to Extract:
- Processing steps (fetch → parse → save)
- Calculation components (team score + date score + ...)
- Validation logic
- Error handling blocks

What NOT to Extract:
- Simple loops (<10 lines)
- Single-purpose blocks already clear
- Coordinator logic that ties steps together
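The fetch → parse → save shape can be sketched generically; every name below is illustrative, not from the codebase. The point is that each helper owns exactly one concern, so it can be tested and wrapped with error handling in isolation:

```python
def sync_config(source, store):
    raw = _fetch(source)    # step 1: I/O only
    parsed = _parse(raw)    # step 2: pure transformation
    _save(store, parsed)    # step 3: persistence only

def _fetch(source):
    return source()

def _parse(raw):
    return {"items": list(raw)}

def _save(store, parsed):
    store.update(parsed)
```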
ROI-Based Decisions
Skip extraction if:
- Function is 50-60 lines but already clear
- Extraction would create more complexity than it removes
- Function is a coordinator that legitimately ties many steps together
Enhanced match cache (Task 3.8) is an example: no long functions, just needs error handling.
Time Estimates
Per-Task Breakdown
| Task | File | Extraction | Error Handling | Testing | Total |
|---|---|---|---|---|---|
| 3.1 | family_league_inference.py | 1.5h | 0.5h | 1h | 3h |
| 3.2 | logo_generator.py | 1h | 0.5h | 0.5h | 2h |
| 3.3 | match_debug_logger.py | 1h | 0h | 0.5h | 1.5h |
| 3.4 | match_suggestions.py | 1h | 0h | 0.5h | 1.5h |
| 3.5 | provider_config_manager.py | 2.5h | 0.5h | 0.5h | 3.5h |
| 3.6 | provider_orchestrator.py | 1h | 0.5h | 0.5h | 2h |
| 3.7 | scoped_team_extractor.py | 1h | 0h | 1h | 2h |
| 3.8 | enhanced_match_cache.py | 0h | 0.5h | 0.5h | 1h |
| Total | 8 files | 9h | 2.5h | 5h | 16.5h |
Estimated Duration: 2-3 days (with buffer)
Success Criteria
Code Quality
✅ All functions <50 lines - Zero functions exceeding 50 lines
✅ Error handling added - All risky operations wrapped with try/except
✅ Logging improved - Debug/error logging for troubleshooting
✅ All imports passing - No broken imports after refactoring
✅ Backward compatibility - 100% compatible with existing code
Testing
✅ Existing tests passing - All tests continue to pass
✅ Manual testing - Test key workflows with sample data
✅ Import verification - Verify all imports work
Documentation
✅ Completion reports - Create task completion .md for each file
✅ Code comments - Add docstrings to extracted helpers
Risk Mitigation
Medium-Risk Areas
- logo_generator.py: PIL image processing can fail in unexpected ways
  - Mitigation: Comprehensive error handling + test with real logos
- provider_config_manager.py: D1 database + YAML caching is complex
  - Mitigation: Test with a staging D1 database first
- provider_orchestrator.py: ThreadPoolExecutor concurrency issues
  - Mitigation: Add timeout handling + test with multiple providers
Low-Risk Areas
- family_league_inference.py: Pure logic, no external dependencies
- match_suggestions.py: Simple similarity calculations
- scoped_team_extractor.py: Regex matching (well-tested)
- enhanced_match_cache.py: In-memory cache (simple)
Dependencies
No external blockers - All work is internal refactoring
Internal dependencies:
- Sprint 2 completion (✅ Done)
- Staging D1 database access (for Task 3.5 testing)
- Sample provider data (for Task 3.6 testing)
Next Steps
Week 9 Execution
- Day 1: Tasks 3.1, 3.2 (5 hours)
- Day 2: Tasks 3.3, 3.4, 3.8 (4 hours)
- Day 3: Tasks 3.5, 3.6, 3.7 (7.5 hours)
Total: ~16.5 hours over 3 days
Week 10 Preview
After Week 9 completion, proceed to Sprint 3 Week 10 (Batch 3B):
- enhanced_event_matcher.py (363L)
- enhanced_team_matcher.py (460L)
- database/connection.py (369L)
- database/migration_runner.py (386L)
- parsers/provider_m3u_parser.py (370L)
- clients/espn_api_client.py (396L)
- clients/tv_schedule_client.py (461L)
Total Batch 3B: 7 files (~2,800 lines)
Appendix
Long Functions Summary
By Severity:
- Critical (>100 lines): 2 functions
  - match_debug_logger._export_excel(): 181 lines
  - provider_config_manager._fetch_from_db(): 119 lines
- High (75-100 lines): 6 functions
  - logo_generator.generate_split_logo(): 99 lines
  - provider_config_manager._load_from_cache(): 96 lines
  - scoped_team_extractor.extract_team(): 94 lines
  - provider_orchestrator.process_all_providers(): 89 lines
  - family_league_inference._infer_from_event_context(): 78 lines
  - provider_config_manager._save_to_cache(): 77 lines
- Medium (50-75 lines): 3 functions
  - family_league_inference._infer_from_teams(): 74 lines
  - family_league_inference.infer_leagues(): 63 lines
  - match_suggestions.calculate_similarity(): 56 lines
Total: 11 long functions across 7 files
Plan Version: 1.0
Created: 2025-11-05
Sprint 2 Status: ✅ Complete
Sprint 3 Week 9 Status: 📋 Ready for Execution
Next Action: Begin Task 3.1 (family_league_inference.py)